feat(sinker): Ship 8 — NV12 / NV21 RGBA via const-ALPHA template#17
feat(sinker): Ship 8 — NV12 / NV21 RGBA via const-ALPHA template#17
Conversation
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Pull request overview
Adds sink-side RGBA output support for the 8‑bit 4:2:0 semi‑planar formats NV12 and NV21, reusing the const-generic ALPHA template approach introduced previously and wiring it through MixedSinker plus the row-kernel dispatch layer and backend kernels.
Changes:
- Add
MixedSinker::<Nv12>::with_rgba/set_rgbaandMixedSinker::<Nv21>::with_rgba/set_rgba, and run native NV12/NV21→RGBA kernels duringprocess. - Add public dispatchers
row::nv12_to_rgba_rowandrow::nv21_to_rgba_row. - Refactor NV12/NV21 scalar + SIMD kernels to a shared
<const SWAP_UV: bool, const ALPHA: bool>implementation, plus add/extend equivalence tests.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/sinker/mixed.rs | Adds NV12/NV21 with_rgba/set_rgba APIs, wires RGBA kernel calls into process, updates compile-fail doctest, adds format-level tests. |
| src/row/scalar.rs | Adds NV12/NV21 RGBA scalar wrappers and refactors shared scalar kernel to <SWAP_UV, ALPHA>. |
| src/row/mod.rs | Introduces public NV12/NV21→RGBA row dispatchers with SIMD selection and scalar fallback. |
| src/row/arch/neon.rs | Refactors NEON NV12/NV21 kernel to <SWAP_UV, ALPHA> and adds RGBA wrappers + tests. |
| src/row/arch/x86_sse41.rs | Refactors SSE4.1 NV12/NV21 kernel to <SWAP_UV, ALPHA> and adds RGBA wrappers + tests. |
| src/row/arch/x86_avx2.rs | Refactors AVX2 NV12/NV21 kernel to <SWAP_UV, ALPHA> and adds RGBA wrappers + tests. |
| src/row/arch/x86_avx512.rs | Refactors AVX-512 NV12/NV21 kernel to <SWAP_UV, ALPHA> and adds RGBA wrappers + tests. |
| src/row/arch/wasm_simd128.rs | Refactors wasm simd128 NV12/NV21 kernel to <SWAP_UV, ALPHA> and adds RGBA wrappers + tests. |
Comments suppressed due to low confidence (1)
src/row/arch/x86_avx2.rs:1533
- These
unsafeNV12/NV21 wrapper functions (and the RGBA wrappers below) lack a/// # Safetysection, while otherunsafewrappers in this file document their contracts. Please add a Safety section (or forward to the shared impl’s Safety contract) stating the AVX2 feature requirement plus width/slice length invariants to prevent accidental UB by direct callers.
/// AVX2 NV12 → packed RGB. Thin wrapper over
/// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`] with
/// `SWAP_UV = false, ALPHA = false`.
#[inline]
#[target_feature(enable = "avx2")]
pub(crate) unsafe fn nv12_to_rgb_row(
y: &[u8],
uv_half: &[u8],
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /// Same as [`nv12_or_nv21_to_rgb_row_impl`]. | ||
| /// WASM simd128 NV12 → packed RGB. Thin wrapper over | ||
| /// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`] with | ||
| /// `SWAP_UV = false, ALPHA = false`. |
There was a problem hiding this comment.
The NV12/NV21 unsafe wrapper functions here (and the new RGBA wrappers) are missing a /// # Safety section, while other unsafe wrappers in this file include one. Please add a Safety section (or forward to the shared impl’s Safety contract) stating that simd128 must be enabled and that width/slice length invariants must hold, to avoid accidental UB by direct callers.
| /// `SWAP_UV = false, ALPHA = false`. | |
| /// `SWAP_UV = false, ALPHA = false`. | |
| /// | |
| /// # Safety | |
| /// | |
| /// Same safety contract as | |
| /// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`]: `simd128` must be | |
| /// enabled at compile time, and the caller must uphold that `width` | |
| /// and all slice lengths satisfy the implementation's invariants so | |
| /// that its unchecked SIMD loads/stores remain in-bounds. |
| /// Same as [`nv12_or_nv21_to_rgb_row_impl`]. | ||
| /// NEON NV12 → packed RGB (UV-ordered chroma). Thin wrapper over | ||
| /// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`] with | ||
| /// `SWAP_UV = false, ALPHA = false`. |
There was a problem hiding this comment.
The unsafe NV12/NV21 wrapper fns (nv12_to_rgb_row / nv21_to_rgb_row and the new RGBA variants) don't have a /// # Safety section, but similar wrappers in this module (e.g. nv24_to_rgb_row) do. Please document the Safety contract (or explicitly forward to nv12_or_nv21_to_rgb_or_rgba_row_impl’s Safety) so direct callers know they must ensure NEON availability and uphold width/slice length invariants.
| /// `SWAP_UV = false, ALPHA = false`. | |
| /// `SWAP_UV = false, ALPHA = false`. | |
| /// | |
| /// # Safety | |
| /// | |
| /// The caller must uphold the same safety contract as | |
| /// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`]: | |
| /// | |
| /// 1. **NEON must be available on the current CPU.** If this wrapper | |
| /// is called directly, the caller is responsible for ensuring that | |
| /// executing NEON instructions is valid. | |
| /// 2. `width & 1 == 0` (4:2:0/NV12 requires even width). | |
| /// 3. `y.len() >= width`. | |
| /// 4. `uv_half.len() >= width`. | |
| /// 5. `rgb_out.len() >= 3 * width`. |
| /// Same as [`nv12_or_nv21_to_rgb_row_impl`]. | ||
| /// SSE4.1 NV12 → packed RGB. Thin wrapper over | ||
| /// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`] with | ||
| /// `SWAP_UV = false, ALPHA = false`. |
There was a problem hiding this comment.
The new unsafe NV12/NV21 wrapper fns here (and the RGBA variants below) don't include a /// # Safety section, unlike other unsafe row wrappers in this module (e.g. yuv_420_to_* and nv24_to_rgb_row). Please add a Safety section (or explicitly forward to the shared impl’s Safety contract) so callers know they must ensure SSE4.1 is available and slices meet the length/width invariants.
| /// `SWAP_UV = false, ALPHA = false`. | |
| /// `SWAP_UV = false, ALPHA = false`. | |
| /// | |
| /// # Safety | |
| /// Callers must ensure SSE4.1 is available on the current CPU before | |
| /// invoking this function. | |
| /// | |
| /// In addition, the slice/width preconditions are the same as for | |
| /// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`]: `y` must contain at least | |
| /// `width` bytes, `uv_half` must contain enough interleaved chroma data | |
| /// for the row width, and `rgb_out` must contain at least `width * 3` | |
| /// bytes. |
| /// Same as [`nv12_or_nv21_to_rgb_row_impl`]. | ||
| /// AVX‑512 NV12 → packed RGB. Thin wrapper over | ||
| /// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`] with | ||
| /// `SWAP_UV = false, ALPHA = false`. |
There was a problem hiding this comment.
The unsafe NV12/NV21 wrapper fns added/updated here (including the new RGBA wrappers) don't document a /// # Safety contract, unlike many other unsafe kernels in this module. Please add a Safety section (or explicitly refer to nv12_or_nv21_to_rgb_or_rgba_row_impl’s Safety) covering the AVX-512F/BW CPU feature requirement and the width/slice length invariants.
| /// `SWAP_UV = false, ALPHA = false`. | |
| /// `SWAP_UV = false, ALPHA = false`. | |
| /// | |
| /// # Safety | |
| /// The caller must ensure that the current CPU supports `avx512f` and | |
| /// `avx512bw`. | |
| /// | |
| /// In addition, all slice-length and `width` invariants required by | |
| /// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`] must hold for `y`, `uv_half`, | |
| /// and `rgb_out`. |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
#18) Tranche 3 of Ship 8 sink-side RGBA. **Wiring-only PR** — no new SIMD code. The 4:2:2 formats reuse the 4:2:0 kernels from tranches 1 + 2: `Yuv422p` calls `yuv_420_to_rgba_row` (already shipped in PR #16), and `Nv16` calls `nv12_to_rgba_row` (already shipped in PR #17). 4:2:2's per-row contract is identical to 4:2:0's — half-width chroma, one pair per Y pair — so the same kernel handles both with no changes.
) Tranche 4b of Ship 8 sink-side RGBA. Adds `Nv24` / `Nv42` (semi-planar 4:4:4) RGBA output via the dual-const-generic `<SWAP_UV, ALPHA>` template established by PR #17 (NV12 / NV21), and **retro-applies a Strategy A combined RGB→RGBA fan-out to all 8 wired families** so callers attaching both `with_rgb` and `with_rgba` no longer pay the per-pixel YUV→RGB math twice — addresses the Copilot review finding from PR #19 (`src/sinker/mixed.rs:1648`).
Tranche 2 of Ship 8 sink-side RGBA. Mass-applies the const-generic-ALPHA template proven by PR #16 to the 4:2:0 semi-planar family — NV12 (UV-ordered, FFmpeg's
nv12, the most common HW-decoder output across CUDA / NVDEC / VideoToolbox / VAAPI / Android MediaCodec / QSV) and NV21 (VU-ordered, Android camera default).Same shape as PR 1: kernel const-generic refactor + format-specific
with_rgbaimpl block + MixedSinker wiring + per-backend equivalence tests + format-level tests. No new SIMD code outside the kernel — thewrite_rgba_*helpers shipped with PR 1 are reused.Scope
Sink-side only, default opaque alpha (
0xFF). Source-side YUVA is Ship 8b (separate follow-up). Per the tracker indocs/color-conversion-functions.md§ Ship 8, this is tranche 2 of 7+:Yuv420pNv12,Nv21Yuv422p,Nv16Yuv444p,Nv24,Nv42,Yuv440pYuv420p9/10/12/14/16,P010/P012/P016Yuv422p9/10/12/14/16,Yuv440p10/12,P210/P212/P216Yuv444p9/10/12/14/16,P410/P412/P416Usage:
MixedSinker::<Nv21>works identically withNv21Frame+nv21_to.What's in this PR
Public API
MixedSinker<Nv12>::with_rgba(&mut [u8])/set_rgba— format-specific impl block, mirrors the Yuv420p pattern.MixedSinker<Nv21>::with_rgba(&mut [u8])/set_rgba— same.row::nv12_to_rgba_row(...)androw::nv21_to_rgba_row(...)— public dispatchers paralleling the RGB variants.Kernel work
The shared NV12/NV21 kernel was already const-generic on
<const SWAP_UV: bool>(NV12 = false, NV21 = true). This PR adds a second const generic<const ALPHA: bool>:row/scalar.rsnv12_or_nv21_to_rgb_or_rgba_row_impl<SWAP_UV, ALPHA>row/arch/neon.rsvst4q_u8whenALPHA(native AArch64 4-channel store)row/arch/x86_sse41.rswrite_rgba_16from PR 1row/arch/x86_avx2.rswrite_rgba_32from PR 1row/arch/x86_avx512.rswrite_rgba_64from PR 1row/arch/wasm_simd128.rswrite_rgba_16from PR 1Each kernel has 4 wrappers now (
nv12_to_rgb_row,nv12_to_rgba_row,nv21_to_rgb_row,nv21_to_rgba_row) thinning to the same monomorphized template. The compiler DCE's the unused branches at every call site.MixedSinker integration
RGBA runs as an independent kernel call (not compose) — same as Yuv420p in PR 1. Default alpha =
0xFFsince neither NV12 nor NV21 carries an alpha plane.Doc updates
docs/color-conversion-functions.md§ Ship 8:Ship 8bsource-side YUVA to a separate follow-up.Tests
+14 lib tests, total 443 (was 429):
Per-backend tests bypass the dispatcher — call each backend's
unsafe nv12_to_rgba_row/nv21_to_rgba_rowdirectly under runtime feature detection. So on AVX-512-capable CI runners, all three x86 paths run; the existing CI matrix (avx512SDE + AVX2-max + SSE4.1-max + scalar tarpaulin tiers) covers every backend.Compile-fail doctest update — PR 1's doctest used
MixedSinker::<Nv12>::with_rgbaas the negative example. Since NV12 now has RGBA wired, the negative example moves toMixedSinker::<Nv16>(tranche 3, not yet wired). Each future tranche moves it forward similarly.Local results (aarch64 macOS): 443 lib tests + 1 doctest pass; wasm32 + x86_64 cross-targets compile clean.
What's deferred
with_rgba_u16ships in tranches 5–7 (whereu16output exists).nv12_to_rgbavs.nv12_to_rgb. Deferred until enough tranches have shipped to make a comparative table meaningful.Test plan
test,test-sde-avx512,cross,coverage,clippy,buildjobs.colconv_disable_*rustflags.cargo doc --lib --no-depsclean (no new doc warnings vs. main).🤖 Generated with Claude Code